Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase ip2me CIR/CBS for faster in-band file transfers #1000

Merged
merged 1 commit into from
Dec 10, 2019

Conversation

rajendra-dendukuri
Copy link
Contributor

Increase incoming packet rate on in-band interfaces to support faster
download of large files. SONiC firmware image download over in-band can
take a lot of time if the incoming packet rate is limited to 600pps. This,
change increases it to 6000pps. Especially when used by Zero Touch Provisioning
or by sonic_installer for firmware upgrade over in-band interfaces.

Signed-off-by: Rajendra Dendukuri [email protected]

What I did
Increased packet rate for traffic destined to CPU.

Why I did it
This is required for ZTP or sonic_installer to download large files (firmware image) over in-band interface.

How I verified it
sonic_installer install -y "url" over in-band network

Details if related

Increase incoming packet rate on in-band interfaces to support faster
download of large files. SONiC firmware image download over in-band can
take a lot of time if the incoming packet rate is limited to 600pps. This,
change increases it to 6000pps. Especially when used by Zero Touch Provisioning
or by sonic_installer for firmware upgrade over in-band interfaces.

Signed-off-by: Rajendra Dendukuri <[email protected]>
@rajendra-dendukuri rajendra-dendukuri changed the title Increase ip2me CIR/CBR for faster in-band file transfers Increase ip2me CIR/CBS for faster in-band file transfers Jul 29, 2019
@lguohan
Copy link
Contributor

lguohan commented Jul 30, 2019

retest this please

@@ -50,8 +50,8 @@
"queue": "1",
"meter_type":"packets",
"mode":"sr_tcm",
"cir":"600",
"cbs":"600",
"cir":"6000",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can make the cpu/kernel busy if there is any packets like ping destined to switch. I would assume it was set to 600pps to control such packets. How about setting it to 1000pps instead of increasing it 10x.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code downloads are the pain point. With 1000pps we will see speeds ~ 1.4MB/s and it takes 7 minutes to download a full sonic image compared to 8.4MB/s with 6000pps. I think 6000pps is not that bad. Most of the switches are having an intel CPU and 8.4MB/s b/w for in-band access looks like a number on the conservative side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajendra-dendukuri what about keep the current value as 600 and leave the user to change the setting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stcheng Are you suggesting that end user changes the setting in source code and re-compile's an image?

Copy link
Contributor

@jipanyang jipanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to some issues related to SAI attribute property and the way copporch was implemented, we skip the reload of new copp json during warm reboot: https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-orchagent/swssconfig.sh#L44 while assuming that copp config won't change.

#582
#583

Probably it is time to revisit the assumption and find a proper fix.

@lguohan
Copy link
Contributor

lguohan commented Jul 31, 2019

agree with jipan.

@rajendra-dendukuri
Copy link
Contributor Author

So. What are the next steps for this?

@lguohan
Copy link
Contributor

lguohan commented Jul 31, 2019

we are changing the default value, and there is concern on the cpu impact. can you provide cpu utilization data for various pps limit.

what is the limit for knet driver rx per second? Can we get cpu utilization measurement for 3000/6000/10000 pps?

warm reboot is another concern, but we can address it in another pr.

@rajendra-dendukuri
Copy link
Contributor Author

rajendra-dendukuri commented Aug 6, 2019

When no policer installed, the stats that were observed before we start seeing drops
AS7712 24,821pps CPU: 27.72%
AS7816 37,792pps CPU: 14.45%

curl command CPU utilization when performing large file transfer over in-band
Intel Atom E3845 @ 1.91GHz, Quadcore
CIR/CBS 6000pps ; CPU 25%
CIR/CBS 3000pps ; CPU 14%
CIR/CBS 1000pps ; CPU 6%

we are changing the default value, and there is concern on the cpu impact. can you provide cpu utilization data for various pps limit.

what is the limit for knet driver rx per second? Can we get cpu utilization measurement for 3000/6000/10000 pps?

warm reboot is another concern, but we can address it in another pr.

@rajendra-dendukuri
Copy link
Contributor Author

Can we try to conclude on this? Without these changes, in-band code download doesn't make sense.

@rajendra-dendukuri
Copy link
Contributor Author

retest vs please

1 similar comment
@rajendra-dendukuri
Copy link
Contributor Author

retest vs please

@rajendra-dendukuri
Copy link
Contributor Author

test this please

Copy link
Member

@skg-net skg-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajendra-dendukuri instead of changing the default config, can this dynamically changed only during ztp process?

@rajendra-dendukuri
Copy link
Contributor Author

@rajendra-dendukuri instead of changing the default config, can this dynamically changed only during ztp process?

  1. Increasing in-band bandwidth to CPU goes beyond ZTP. It is applicable even to a simple case of download code via in-band network using "sonic_installer install" command. Also there may be more applications which may require more bandwidth that what is set to by default.

  2. We can have ZTP change the default value at runtime but it will also have to roll back the change. So if user changes the value to something else during ZTP process, the ZTP service may not be aware of it. It would have been easy if the CIR/CBS config was available in config db. However, that is not the case as we need to use a different copp.json file and a command to apply the changes.

Given above two reasons, I feel we should increase the default value to a reasonable agreed upon value.

@skg-net
Copy link
Member

skg-net commented Nov 27, 2019

@rajendra-dendukuri

@rajendra-dendukuri instead of changing the default config, can this dynamically changed only during ztp process?

  1. Increasing in-band bandwidth to CPU goes beyond ZTP. It is applicable even to a simple case of download code via in-band network using "sonic_installer install" command. Also there may be more applications which may require more bandwidth that what is set to by default.
  2. We can have ZTP change the default value at runtime but it will also have to roll back the change. So if user changes the value to something else during ZTP process, the ZTP service may not be aware of it. It would have been easy if the CIR/CBS config was available in config db. However, that is not the case as we need to use a different copp.json file and a command to apply the changes.

Given above two reasons, I feel we should increase the default value to a reasonable agreed upon value.

@rajendra-dendukuri instead of changing the default config, can this dynamically changed only during ztp process?

  1. Increasing in-band bandwidth to CPU goes beyond ZTP. It is applicable even to a simple case of download code via in-band network using "sonic_installer install" command. Also there may be more applications which may require more bandwidth that what is set to by default.
  2. We can have ZTP change the default value at runtime but it will also have to roll back the change. So if user changes the value to something else during ZTP process, the ZTP service may not be aware of it. It would have been easy if the CIR/CBS config was available in config db. However, that is not the case as we need to use a different copp.json file and a command to apply the changes.

Given above two reasons, I feel we should increase the default value to a reasonable agreed upon value.

@rajendra-dendukuri #2 we should block the user configuration during ZTP, since this will have other implication like loosing user configured data when doing ZTP disable.
#1 The value can be fine tuned based on the SONiC capabilities to handle the incoming rate. That value can be same or improved for ZTP scenario, since during that time no other protocols are running yet.

If value need to go beyond the SONiC defined default value, I prefer dynamically change based on the ZTP enable and disable.

@lguohan lguohan merged commit 03be983 into sonic-net:master Dec 10, 2019
abdosi pushed a commit that referenced this pull request Mar 4, 2020
Increase incoming packet rate on in-band interfaces to support faster
download of large files. SONiC firmware image download over in-band can
take a lot of time if the incoming packet rate is limited to 600pps. This,
change increases it to 6000pps. Especially when used by Zero Touch Provisioning
or by sonic_installer for firmware upgrade over in-band interfaces.

Signed-off-by: Rajendra Dendukuri <[email protected]>
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
Signed-off-by: Stepan Blyschak [email protected]

The motivation for this change is described in the proposal sonic-net/SONiC#935 and proposal in SAI opencomputeproject/SAI#1404
NOTE: Requires to update SAI once opencomputeproject/SAI#1404 is in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants